HIGHLIGHTS


• We utilize the low-frequency characteristics of FIR images for ROI generation. 

• We generate ROIs based on combining image segments instead of using the intensity threshold. 

• The proposed method generates a small number of ROIs at an acceptable miss rate.

• We introduce miss rate and ROIs per image (ROIPI) to assess the performance of the segment-based ROI generation module. 
We present a region of interest (ROI) generation method specialized for nighttime pedestrian detection
using far-infrared (FIR) images. Because pedestrians typically appear brighter than the background in FIR
images, previous research primarily attempted to extract ROIs based on an intensity threshold.
However, this approach suffers from the intensity variance of pedestrians caused by their
clothing and, especially in urban scenarios, from other heat sources that emit more heat than the
pedestrians. In this paper, we propose a novel ROI generation method that is based on combining image
segments instead of using an intensity threshold. To minimize dependence on brightness, we utilize
the low-frequency characteristics of FIR images. As a result, the proposed method generates a small
number of ROIs at an acceptable miss rate, and the generated ROIs provide advantages for classification
because the pedestrians are satisfactorily arranged within the bounding box. Experiments indicate
that the proposed method performs reliably in urban scenarios.

© 2013 Elsevier B.V. All rights reserved. 


1. Introduction 

Far-infrared (FIR) images, unlike visible spectrum images, capture
the heat emitted from objects, so pedestrians typically appear
brighter (i.e., they have a relatively higher temperature) than the
background. Thus, FIR technology is well suited to pedestrian detection
at nighttime, when a visible spectrum camera is not usable.
Generally, the first step in pedestrian detection is the extraction
of ROIs that are highly likely to contain pedestrians. Thresholding
is very useful for detecting warm objects that are likely to
be pedestrians, so the majority of earlier research efforts in this
area focused on generating ROIs based on an intensity threshold.

Global image thresholding is the simplest technique for extracting
hotspots. In [1], a simple intensity threshold is chosen experimentally,
whereas in [2], the threshold is determined based on the
overall mean of the image, and in [3], the image threshold is set
using the mean and maximum intensity values in the current
frame. However, these approaches encounter problems when they
attempt to determine suitable threshold values. These problems
arise from the non-uniform intensities of pedestrians caused by
their clothes, from multiple neighboring pedestrians, and from other objects,
such as cars and electric signs, that are warmer than the pedestrians.

In order to compensate for these limitations, additional steps 
such as region growing and active contours are used for generating 
potential candidates. In [4], a seed point is extracted using a high 
threshold that is near the maximum image intensity, then the 
seeded region growing method is used to generate candidate 
images. In [5], an active contour model is used on edge images to 
extract the boundaries between potential candidates and their 
backgrounds. 

Instead of using each pixel value to extract hotspots, some methods use
one-dimensional horizontal and vertical image intensity profiles
to generate candidate bounding boxes. In [6], vertical
strips are extracted at the local minima of the horizontal intensity
profile generated using a brightness threshold, and are then vertically
segmented according to brightness and bodyline estimation. Similarly,
in [7], thresholded horizontal and vertical image intensity
profiles are used to find intersections between the vertical and horizontal
boundaries that define the bright region. Symmetry of
intensities, symmetry of vertical edges, and density of vertical
edges on a horizontal profile are used to generate candidates in [8].

Although these methods work well, they all require a threshold
value to extract bright pixels. Thus, the threshold value still significantly
affects the quality of the ROIs.

In addition to pixel brightness, FIR images are characterized by
low frequency, a very important characteristic that has not been
considered in previous works. Temperature changes are not significant
within the same object, and FIR is not significantly affected by the
texture of an object’s surface, so the output of an FIR camera is a
low-frequency image containing regions of similar intensity,
unlike visible spectrum images. These attributes are well illustrated
in Fig. 1. Owing to this characteristic, FIR images are easy to
segment into regions of similar temperature.

In this paper, we present a novel ROI generation method for
FIR-camera-based pedestrian detection systems. Image segmentation is
used to extract segments of pedestrian candidates, and ROIs are
generated by combining the segments under commonly used geometric
constraints. We evaluate the performance of the proposed
ROI generation module itself, and its pedestrian detection results
on FIR image sequences, using the single frame evaluation method.

2. The proposed ROI generation method 

ROI generation is the process of finding regions that are highly
likely to contain a pedestrian. In FIR images, pedestrians appear
brighter than the background because FIR cameras capture the heat
emitted from objects. Hence, most previous approaches exploit this
characteristic by starting from an intensity threshold
instead of using the sliding window technique, which is
frequently used in visible-spectrum-camera-based pedestrian
detection.

As mentioned in [4], the FIR signature of pedestrians can be
distorted because of their clothing. As a result, pedestrians may
appear as several regions of similar intensities rather than a single
region of uniform intensity. This makes it difficult to
extract pedestrian segments using an intensity threshold, so we
consider regions of similar intensities rather than bright pixels.
In our work, instead of beginning with an intensity threshold,
we start the module with image segmentation.

2.1. Image segmentation 

To generate an ROI, we do not seek hot spots based on an intensity
threshold; instead, we use segments composed of pixels with
similar intensities. Although a pedestrian does not have uniform
intensity in an FIR image, it is divided into several
regions that each have a similar intensity. The idea underlying our
approach stems from this characteristic. Because of this characteristic,
intensity-threshold-based methods have difficulty
finding suitable thresholds: a threshold that is too high causes
the pedestrian candidate to be split into several blobs, whereas
one that is too low distorts the pedestrian candidate by including
background pixels. To overcome these problems, previous
works utilized post-processing techniques such as region growing
and active contours to find the boundary between pedestrians
and their backgrounds. We solve these problems in roughly the
reverse order. To obtain the boundaries between pedestrians
and their backgrounds, instead of spreading from a seed point
that is extracted using an intensity threshold, our approach
starts with image segmentation, because FIR images are easy to
segment, which enables the extraction of foreground heat
source objects from their backgrounds. When image segmentation
is applied to an FIR image, the pedestrian is split into
several segments, but these segments are differentiated from the background.
Thus, through the proper combination of those segments, ROIs
can be extracted effectively. Fig. 2 shows several image segmentation
results.

We use the mean shift segmentation algorithm to extract segments
because its results are readily controlled with a few parameters
and it provides reliable performance. Mean shift is a mode-seeking
algorithm that was popularized for image segmentation
by Comaniciu and Meer [9]. Users only need to
set the spatial and range resolutions $(h_s, h_r)$,
which control the size and number of segments; an optional
minimum segment size ($M$) can be used to eliminate
small segments. Suitable parameters for extracting pedestrian
segments can be determined experimentally. Pixels are clustered
within bandwidth $h_s$ in the spatial domain and $h_r$ in
the range domain; thus, a smaller value for either parameter tends
to generate smaller segments. Fig. 3 shows the results
of parameter variations. As shown in the figure, the range parameter
$h_r$ controls the number of regions: because neighboring
pixels are grouped together in the range domain (i.e., gray levels
in our case), the number of segments decreases as $h_r$ becomes larger,
and the boundary between pedestrians and their background
may become blurred. Hence, pedestrians may be merged
into the background. A similar effect occurs when the spatial
parameter increases: because more distant pixels are grouped together
in the spatial domain, the boundaries between objects and background
tend to become blurred as $h_s$ becomes larger.
To determine the parameters, we evaluated ROI generation
performance versus parameter variations using miss rate



Fig. 1. Sample images of pedestrians at a crosswalk. (a) Visible spectrum image (640 × 480). (b) Far-infrared image (320 × 240).
Fig. 2. Images resulting from image segmentation. (a-c) The left column shows the FIR camera input image and the right column shows the result of image segmentation
(segmented with parameters $(h_s, h_r, M) = (9, 5, 30)$).


and ROIs per image. The details of these measures are
given in Section 3.1. In accordance with the results of the evaluations,
we set the parameters $(h_s, h_r, M)$ to (9, 5, 30), as shown in
Fig. 3b. The output of mean shift segmentation is a labeled image
of segments. These segments are used to extract ROIs by combining
them under constraints.
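The roles of $h_s$, $h_r$, and $M$ can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work: it assumes a flat (uniform) kernel, a small grayscale image stored as a list of rows, and it simply marks segments smaller than the minimum size with the label -1. The function names `mean_shift_filter` and `label_segments` are hypothetical.

```python
from collections import deque

def mean_shift_filter(img, hs, hr, max_iter=20, eps=1e-3):
    """Flat-kernel mean shift over (x, y, gray level); returns the mode gray level per pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            cx, cy, cv = float(x0), float(y0), float(img[y0][x0])
            for _ in range(max_iter):
                sx = sy = sv = 0.0
                n = 0
                for y in range(max(0, int(cy - hs)), min(h, int(cy + hs) + 1)):
                    for x in range(max(0, int(cx - hs)), min(w, int(cx + hs) + 1)):
                        # Keep pixels inside both the spatial (hs) and range (hr) windows.
                        if (abs(x - cx) <= hs and abs(y - cy) <= hs
                                and abs(img[y][x] - cv) <= hr):
                            sx += x
                            sy += y
                            sv += img[y][x]
                            n += 1
                if n == 0:
                    break
                nx, ny, nv = sx / n, sy / n, sv / n
                shift = abs(nx - cx) + abs(ny - cy) + abs(nv - cv)
                cx, cy, cv = nx, ny, nv
                if shift < eps:
                    break
            out[y0][x0] = cv
    return out

def label_segments(filtered, hr, min_size):
    """4-connected labeling of the filtered image; segments below min_size get label -1."""
    h, w = len(filtered), len(filtered[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for y0 in range(h):
        for x0 in range(w):
            if labels[y0][x0] is not None:
                continue
            labels[y0][x0] = next_label
            queue, members = deque([(y0, x0)]), [(y0, x0)]
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] is not None:
                        continue
                    if abs(filtered[ny][nx] - filtered[y][x]) <= hr:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
                        members.append((ny, nx))
            if len(members) < min_size:
                for y, x in members:
                    labels[y][x] = -1  # simplification: discard small segments
            next_label += 1
    return labels

# Toy example: a warm 3 x 3 block on a cool, uniform background.
img = [[10] * 8 for _ in range(8)]
for y in range(2, 5):
    for x in range(3, 6):
        img[y][x] = 200
labels = label_segments(mean_shift_filter(img, hs=2, hr=5), hr=5, min_size=4)
```

In this toy image the warm block and the background end up with different labels. A larger `hr` would let dissimilar gray levels join the same window, which is the blurring effect described above.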

2.2. Segment-based ROI generation 

As shown in the image segmentation results, pedestrians are
usually split into several segments rather than appearing as a single
segment because of differences in heat emission from parts of
their bodies, their clothes, bags, and other such accessories. Hence,
we consider as candidates not only single segments but also
combinations of pairs of segments. With this kind of combination, we
look for a pair of segments comprising the head and the legs and generate
a bounding box by connecting the two segments.

A similar segmentation-based candidate selection method is
utilized in [10] for visible spectrum images. That method
generates candidates using only single segments, not pairs of segments,
in images segmented at multiple scales, in order to find
unbroken objects through scale variations. However, the method is
time-consuming, and it is also difficult to acquire single unbroken
candidates because edges are not strong in FIR images. As the targets
of our application are pedestrians on roads, the combinations
can be restricted to vertically ordered segments. This
makes the combination procedure very simple.

Since image segmentation of an FIR image produces tens of segments,
it is not practical to consider all the possible combinations
of pairs of segments for ROI generation. To reduce the
number of possible combinations, only relatively bright segments
are considered for combination, as they have a higher probability
of being part of a pedestrian. Thus, only segments that are
brighter than the background segment are considered for combination.
The background segment is selected as the largest segment occupying
the lowermost rows, because the ground region has uniform intensity
and is therefore merged into a single segment. Fig. 4a
shows an example of the extraction of segments that are brighter
than the background segment.
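The selection of bright segments can be sketched as follows. This is a minimal illustration under two assumptions that are not stated in the text: "occupying the lowermost rows" is read as "touching the last image row", and the brightness of a segment is taken to be its mean intensity. The function name `bright_segments` is hypothetical.

```python
def bright_segments(img, labels):
    """Return (background_label, labels of segments brighter than the background)."""
    sums, counts = {}, {}
    for row_values, row_labels in zip(img, labels):
        for value, label in zip(row_values, row_labels):
            sums[label] = sums.get(label, 0) + value
            counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    # Assumption: the background is the largest segment that touches the bottom row.
    background = max(set(labels[-1]), key=lambda label: counts[label])
    brighter = sorted(l for l in means if means[l] > means[background])
    return background, brighter

# Toy example: label 0 = ground, label 1 = warm object, label 2 = cold sky.
labels = [[2, 2, 2, 2],
          [2, 1, 1, 2],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
intensity = {0: 40, 1: 180, 2: 5}
img = [[intensity[l] for l in row] for row in labels]
background, candidates = bright_segments(img, labels)
```

Here the ground segment is chosen as background, and only the warm object remains as a segment eligible for combination.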

The horizontal overlap ratio (defined as the ratio of the intersection
to the union) between segments is the rule used to combine
pairs of segments, because pedestrians stand perpendicular to the
ground and pedestrian segments are split along horizontal boundaries
(for example, into head, body, and legs). The horizontal overlap
ratio ($s_o$) between segments ($seg_a$, $seg_b$) is calculated from their
horizontal length ($\mathrm{length}_h$) as:
Fig. 3. Images resulting from spatial and range resolution parameter variations. (a) Input image. (b) Segmented image with $(h_s, h_r, M) = (9, 5, 30)$, which was used for our
detection system. (c) $(h_s, h_r, M) = (9, 1, 30)$. (d) $(h_s, h_r, M) = (9, 20, 30)$. (e) $(h_s, h_r, M) = (3, 5, 30)$. (f) $(h_s, h_r, M) = (20, 5, 30)$.


$$s_o = \frac{\mathrm{length}_h(seg_a \cap seg_b)}{\mathrm{length}_h(seg_a \cup seg_b)} \qquad (1)$$

In our experiments, pairs of segments whose horizontal
overlap ratio exceeded 30% (a value determined experimentally)
were accepted as candidates. The bounding box of each candidate (defined as
a rectangle whose upper-left and lower-right coordinates
are determined by the coordinates of the pair of
segments) is generated as illustrated in Fig. 4b, using the segments of
the pedestrian. Fig. 4c shows the candidate images in full.
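The combination rule can be sketched as follows. This is a minimal illustration, assuming that each segment is summarized by its bounding box `(left, top, right, bottom)` in continuous pixel coordinates and that the horizontal length of a segment is the width of that box; the restriction to vertically ordered segments is omitted for brevity. The function names are hypothetical.

```python
from itertools import combinations

def horizontal_overlap(a, b):
    """Intersection over union of the horizontal extents of two boxes, as in Eq. (1)."""
    inter = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    union = (a[2] - a[0]) + (b[2] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def generate_candidates(boxes, min_overlap=0.3):
    """Single segments plus enclosing boxes of pairs whose overlap ratio exceeds min_overlap."""
    candidates = list(boxes)
    for a, b in combinations(boxes, 2):
        if horizontal_overlap(a, b) > min_overlap:
            candidates.append((min(a[0], b[0]), min(a[1], b[1]),
                               max(a[2], b[2]), max(a[3], b[3])))
    return candidates

# Toy example: a head and a pair of legs that line up, plus an unrelated warm object.
head = (10, 0, 16, 8)
legs = (8, 30, 18, 60)
lamp = (40, 5, 46, 12)
ratio = horizontal_overlap(head, legs)   # 6 / 10 = 0.6
candidates = generate_candidates([head, legs, lamp])
```

The head and legs overlap horizontally by 60%, so their enclosing box (8, 0, 18, 60) is added to the three single-segment candidates, while the lamp is never paired with either of them.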

At this point, geometrical constraints are generally used to filter 
out impractical candidates such as candidates that are too far away 
and candidates that are too big or too small for that distance. This 
procedure can reduce the number of ROIs needing classification 
and can also reduce false positives. The constraints that we used
were flat world assumption, ground-plane-based objects, and the 
height of the pedestrians. Fig. 5 illustrates the constraints that were 
placed on the candidates. In our work, only pedestrians with a 
height of 1-2 m within a distance of 50 m were considered targets. 
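One way to apply these constraints is sketched below. It assumes a pinhole camera with zero pitch above a flat road, so that the distance follows from the image row of the feet; the focal length, camera height, and horizon row are hypothetical calibration values that are not given in this paper, and boxes are `(left, top, right, bottom)` with rows increasing downward.

```python
def passes_geometry(box, focal_px, cam_height_m, horizon_row,
                    max_dist_m=50.0, min_h_m=1.0, max_h_m=2.0):
    """Check distance and real-world height of a candidate under a flat-world model."""
    left, top, right, bottom = box
    if bottom <= horizon_row:
        return False  # feet at or above the horizon cannot lie on the ground plane
    # Flat-world assumption: distance from the row offset of the feet below the horizon.
    dist_m = focal_px * cam_height_m / (bottom - horizon_row)
    # Back-project the pixel height to meters at that distance.
    height_m = (bottom - top) * dist_m / focal_px
    return dist_m <= max_dist_m and min_h_m <= height_m <= max_h_m

# Hypothetical calibration: 400 px focal length, camera 1.2 m above ground, horizon at row 120.
calib = dict(focal_px=400.0, cam_height_m=1.2, horizon_row=120)
plausible = passes_geometry((100, 100, 120, 168), **calib)   # about 10 m away, about 1.7 m tall
too_tall = passes_geometry((100, 20, 120, 168), **calib)     # about 3.7 m tall
floating = passes_geometry((100, 60, 120, 110), **calib)     # feet above the horizon
```

With these illustrative values only the first box is kept; the other two are rejected as too tall for their distance or as not standing on the ground plane.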

In the cases where the constraints were met, the width of the
bounding box was changed to give a fixed aspect ratio (defined as the
ratio of the width to the height) of 0.5. The final result of ROI
generation is shown in Fig. 6. As shown in the figure, a small number of
ROIs containing pedestrians were successfully extracted.
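The width adjustment can be sketched in a few lines. The text does not say how the resized box is positioned, so this sketch assumes that it stays centered on the original box; the function name is hypothetical.

```python
def fix_aspect_ratio(box, aspect=0.5):
    """Resize the width of (left, top, right, bottom) to aspect * height, keeping the center."""
    left, top, right, bottom = box
    width = aspect * (bottom - top)
    center_x = (left + right) / 2.0
    return (center_x - width / 2.0, top, center_x + width / 2.0, bottom)

roi = fix_aspect_ratio((100, 100, 120, 160))  # 60 px tall, so the width becomes 30 px
```

For this 20 × 60 pixel box the result is (95.0, 100, 125.0, 160), that is, a 30 × 60 pixel ROI around the same vertical axis.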


3. Results and discussion 

In this section, we introduce miss rate and ROIs per image (ROIPI)
to assess the performance of the segment-based ROI generation
module. We then evaluate pedestrian detection performance by
extracting histograms of local intensity differences
(HLID) features from the ROIs and classifying them using a linear support vector
machine (SVM). The test images, amounting to 6573 frames, were taken
from a moving vehicle in an urban area at nighttime.

3.1. Evaluation of the segment-based ROI generation method 

We evaluated our proposed ROI generation method using 1905
of the 6573 images, excluding the images used for classifier training.
The image sequences included 1480 pedestrians with heights
ranging from 33 to 179 pixels, which were
manually labeled with ground truth bounding boxes whose
aspect ratio was the same as that of the ROI bounding boxes.
We followed the single frame evaluation method used by
Dollár et al. [11]. Evaluations were performed between the generated
bounding box of the ROI ($BB_{ROI}$) and the ground truth bounding box
($BB_{GT}$). The matching condition between a $BB_{ROI}$ and a $BB_{GT}$ is
defined in terms of the area of overlap ($a_o$):







D.S. Kim, K.H. Lee / Infrared Physics & Technology 61 (2013) 120-128



Fig. 4. Procedure used for segment-based candidate generation. (a) Bright segment extraction for combination (all the non-black segments are considered prime segments for
combination). (b) Examples of candidate generation. Left: pedestrians are split into five segments that are numbered for illustration. Right: each single segment is a candidate
by itself, and a pair of segments can be a candidate if it satisfies the horizontal overlap ratio. (From upper left, candidates are generated from segments {1}, {1,2}, {1,3}, {1,4},
{1,5}, {2}, {2,3}, etc.) (c) The 498 candidates generated from the overlapped pairs of segments in the current frame.



Fig. 5. Geometric constraints for extracting practical candidates (each red box has a 
height of 2 m and each blue box has a height of 1 m at each distance, numbers on 
the right side indicate the distance in meters). (For interpretation of the references 
to color in this figure legend, the reader is referred to the web version of this 
article.) 


$$a_o = \frac{\mathrm{area}(BB_{ROI} \cap BB_{GT})}{\mathrm{area}(BB_{ROI} \cup BB_{GT})} > 0.5 \qquad (2)$$


The condition states that the area of overlap must exceed 50%
(we adopted this value from [11]). Under this condition, an
unmatched $BB_{GT}$ is counted as a false negative, whereas an
unmatched $BB_{ROI}$ is counted as a false positive.
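The matching condition can be sketched as follows. This is a simplified illustration, assuming boxes are `(left, top, right, bottom)` in continuous coordinates; unlike a full benchmark protocol it does not enforce one-to-one matching, so several ROIs may match the same ground truth box. The function names are hypothetical.

```python
def overlap_ratio(a, b):
    """Area of intersection over area of union of two boxes, as in Eq. (2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_frame(rois, gts, thresh=0.5):
    """Return (false_negatives, false_positives) for one frame."""
    matched_gt, matched_roi = set(), set()
    for i, roi in enumerate(rois):
        for j, gt in enumerate(gts):
            if overlap_ratio(roi, gt) > thresh:
                matched_roi.add(i)
                matched_gt.add(j)
    return len(gts) - len(matched_gt), len(rois) - len(matched_roi)

# Toy example: one ground truth box, one well-placed ROI, one stray ROI.
gts = [(0, 0, 10, 20)]
rois = [(1, 0, 11, 20), (50, 0, 60, 20)]
false_negatives, false_positives = match_frame(rois, gts)
```

The first ROI overlaps the ground truth by 180/220 (about 0.82), so the pedestrian is not missed, while the stray ROI counts as one false positive.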

The main objective of ROI generation is to accurately capture
the ROIs that contain pedestrians, without missing any candidates,
using as few ROIs as possible. Hence, we computed the
miss rate and ROIs per image (ROIPI) to determine the efficiency
as well as the accuracy of the ROI generation module. The results
are illustrated in Fig. 7 for various mean shift parameters. To



Fig. 6. Final result of ROI generation (27 ROIs generated). 


show the correlation between ROI generation performance
and detection performance, we tested the detection performance
of representative cases among the variations, as shown in Table 1.
The results are shown in Fig. 8. The details of the detection
experiments are described in Section 3.2. From the results, we
can conclude that, as expected, not only the miss rate but also
the ROIPI has to remain small in order to achieve better detection
performance, because the miss rate of the ROI generation module
affects the miss rate of the overall system and a large ROIPI increases
the number of false positives.
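The two measures can be computed from per-frame counts as in the sketch below. It assumes that the miss rate is the fraction of ground truth pedestrians left unmatched over the whole sequence and that ROIPI is the total number of ROIs divided by the number of frames; the function name and the input layout are hypothetical.

```python
def roi_metrics(frames):
    """frames: iterable of (num_rois, num_ground_truth, num_missed) tuples, one per image."""
    frames = list(frames)
    total_rois = sum(f[0] for f in frames)
    total_gt = sum(f[1] for f in frames)
    total_missed = sum(f[2] for f in frames)
    miss_rate = total_missed / total_gt if total_gt else 0.0
    roipi = total_rois / len(frames) if frames else 0.0
    return miss_rate, roipi

# Toy example: three frames, four pedestrians in total, one of them missed.
miss_rate, roipi = roi_metrics([(20, 2, 0), (18, 1, 1), (22, 1, 0)])
```

For these made-up counts the miss rate is 0.25 and the ROIPI is 20.0. Reporting both values makes the trade-off visible: generating more ROIs lowers the miss rate but gives the classifier more opportunities for false positives.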

It is difficult to compare our results with those of other
researchers because ROI generation methods have not previously
been evaluated in this way. The conventional approach
is to assess performance using the final
detection results combined with the classification module.
However, this method of evaluating the ROI generation module is
Fig. 7. Performance measures using miss rate versus ROIPI of our proposed ROI generation method for mean shift parameter $(h_s, h_r, M)$ variations.


Table 1
Performance measures of our proposed ROI generation method for mean shift
parameter variations.

(h_s, h_r, M)    ROIPI    Miss rate (%)
(3, 3, 30)       66.84    0.88
(9, 3, 30)       32.97    1.15
(9, 5, 30)       19.92    1.95
(9, 7, 30)       15.61    3.38
(9, 9, 30)       13.99    5.20


confusing because pedestrian detection results are affected not
only by the ROI generation module but also by the choice of features,
classifiers, and techniques used to merge multiple detections.
A comparison with the sliding window technique, however, can
show that our results are meaningful because it is a
well-known ROI generation method. To generate ROIs using the
sliding window technique, we applied the geometric constraints
to reduce the number of ROIs. Further, we applied the intensity
threshold to each window to retain bright regions that are highly

Table 2
Performance measures of the intensity threshold-based sliding window technique for
sliding resolution variations.

Longitudinal resol. (m)    Lateral resol. (m)    ROIPI     Miss rate (%)
1.5                        0.3                   136.4     1.8
1.0                        0.2                   310.9     0.47
0.5                        0.1                   1248.2    0.13


Table 3
Test parameters for HLID.

Parameter               Value
Sample size             24 × 48 pixels
Cell size               6 × 6 pixels
No. of directions       8-bin
Radius                  2
Block size              2 × 2 cells
Overlap                 0.5 block
Normalization method    2-norm



Fig. 8. Detection performance comparison results for mean shift parameter variations (the legend gives the order of performance by the miss rate at $10^{-1}$ FPPI).
Fig. 9. Segment-based ROI generation and intensity threshold-based sliding window technique pedestrian detection evaluation results (the legend gives the order of
performance by the miss rate at $10^{-1}$ FPPI).



Fig. 10. Detection examples using representative scenarios (the red boxes indicate the detection results while the blue dotted boxes indicate the generated ROIs). (a and b)
Pedestrian with hot spots. (c and d) Multiple pedestrians close to each other. (e) A pedestrian that is very near. (f) Small pedestrians are also included. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
likely to contain a pedestrian, which is the starting point of most
previous approaches. In this way, we indirectly compared
our method to the intensity threshold-based technique. The
threshold value was set optimally for the test image sequences, and
approximately 90% of the ROIs were filtered out. The results are
shown in Table 2 for several sliding resolutions. If the ROIPI
is not considered, the miss rate appears to be better than that of
our method in cases where the sliding resolution is dense. However,
the quality of the ROIs is worse than that of those obtained using our
method. The bounding boxes obtained using the sliding window
technique do not contain well-arranged candidates because they
are obtained with a fixed size and at fixed positions. This makes
it difficult to obtain correct answers from a classifier that is
trained using properly arranged samples. To compensate for this
problem, denser sliding resolutions or bounding boxes of various
heights can be used. However, these types of solutions
significantly increase the number of ROIs. Further, the large number
of ROIs increases the number of false positives and degrades the
detection performance.

While the results of our proposed method vary according to the
candidates, the candidates are suitably arranged in the bounding
box because the boundaries between the objects and the backgrounds
are well defined as segments by image segmentation. This
characteristic increases the detection performance of the classifier.

In the best-case scenario, $(h_s, h_r, M) = (9, 5, 30)$,
our proposed method generated only approximately 20 ROIs per
frame with a reasonable miss rate. False negatives occurred mainly
when the pedestrian segments merged into overlapping objects
that emitted heat with a similar intensity, such as vehicles. This
is a common limitation of FIR-based detection systems, and it needs
to be addressed in future work.

3.2. Evaluation of the pedestrian detection performance of the 
proposed ROI generation method 

To evaluate the pedestrian detection performance of our proposed
ROI generation method, we extracted the HLID feature
[12], a modified version of histograms of oriented gradients
(HOG) [13] that is specialized for FIR images, from the ROIs. HLID
expresses a sample image on the basis of its local shape and appearance
using histograms of local intensity differences. The
parameters used to extract the features are shown in Table 3.
The features were classified using a linear SVM.
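As a rough check on the size of the feature vector implied by Table 3, the arithmetic below assumes that HLID tiles the sample with cells and overlapping blocks in the same way as HOG. That layout is an assumption made only for illustration; the actual HLID descriptor is defined in [12] and may differ. The function name is hypothetical.

```python
def hog_like_descriptor_length(sample=(24, 48), cell=6, block_cells=2,
                               overlap=0.5, bins=8):
    """Feature length for a HOG-style layout of cells and overlapping blocks."""
    cells_x, cells_y = sample[0] // cell, sample[1] // cell   # 4 x 8 cells
    stride = max(1, int(block_cells * (1 - overlap)))         # block stride in cells
    blocks_x = (cells_x - block_cells) // stride + 1
    blocks_y = (cells_y - block_cells) // stride + 1
    return blocks_x * blocks_y * block_cells * block_cells * bins

length = hog_like_descriptor_length()
```

Under this assumption a 24 × 48 sample yields 3 × 7 blocks of 2 × 2 cells with 8 bins each, that is, 672 values per ROI.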

After classification, the pairwise max (PM) suppression technique
[14], which discards the lower-confidence detection of each
overlapping pair, was used to merge multiple detections of a pedestrian. The
matching condition between the detection and the ground truth
was the same as that used for the ROI evaluation in Eq. (2), except
that the detection bounding box ($BB_{DT}$) was used instead
of $BB_{ROI}$. The performance of the pedestrian detection is
demonstrated using the detection error tradeoff (DET) curve, which plots
the miss rate against the false positives per image (FPPI) rate on a
log-log scale.
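The points of such a curve can be obtained by sweeping the confidence threshold over the scored detections, as in the sketch below. It assumes that each detection has already been marked as a true or false positive using Eq. (2), with at most one true positive per ground truth box; the function name and input layout are hypothetical.

```python
def det_points(detections, num_gt, num_images):
    """detections: list of (score, is_true_positive). Returns [(fppi, miss_rate), ...]."""
    points = []
    tp = fp = 0
    # Lower the confidence threshold one detection at a time, from highest score down.
    for score, is_tp in sorted(detections, key=lambda d: -d[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / num_images, 1.0 - tp / num_gt))
    return points

# Toy example: four scored detections over two images containing four pedestrians.
points = det_points([(0.9, True), (0.8, False), (0.7, True), (0.4, False)],
                    num_gt=4, num_images=2)
```

Each point pairs an FPPI value with the miss rate at that threshold; plotting the points on log-log axes and reading the miss rate at a reference FPPI gives a single-number summary of the kind used in the figure legends.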

We compared our results with those of the intensity threshold-based
sliding window technique described in Section 3.1. The results are
shown in Fig. 9. Even though the sliding window technique has a
lower ROI miss rate than our method, our method still outperformed
it in detection. If the arrangement of candidates in the captured ROI is not taken
into consideration, these results are to be expected, as the sliding
window method generates many more ROIs than our method.
However, as we indicated, it is important that both the miss rate
and the ROIPI remain small. Because the test sequences were captured
in urban areas, many heat sources other than pedestrians
exist. These cause many scattered hot pixels in thresholded images.
Hence, a large number of ROIs are generated by the threshold-based
sliding window technique due to those hot regions. In contrast,
our proposed method still generated only a small number
of ROIs under these conditions, which accounts for the differences
in detection performance.

In addition, the sliding window results show that
well-aligned ROIs are also important for better performance,
because a dense sliding resolution results in better performance even
though many more ROIs are generated. Therefore, the effects of
arrangement need to be considered in future work.



Fig. 11. Limitations of the FIR-based system. (a) Detection failure because of similar intensity to other objects. (b) Unstable detections due to pedestrians overlapping each
other to a large extent.
Representative results illustrating the advantages of the
segment-based ROI generation method and the limitations of
FIR-based pedestrian detection are shown in Figs. 10 and 11. Our
proposed method was not affected by the presence of objects that
were warmer than the pedestrians because they were separated
into different segments (Figs. 10a and b). This is in direct contrast
to the intensity threshold techniques, which are severely affected
by these hot spots. When multiple pedestrians walked close to
each other in the street scenarios, the pedestrians were detected
separately (Figs. 10c and d). Pedestrians that were near
were split into more segments than those at
medium or far distances, and were still detected successfully
(Fig. 10e). Further, small pedestrians over 1 m in height were
detected successfully, as shown in Fig. 10f. However, when the
intensity of the pedestrian was similar to that of other overlapping objects or
when pedestrians overlapped each other to a large extent, it was
still difficult to detect them. In these cases, we expect that
additional algorithms such as component-based classification [15] or
tracking will prove helpful in improving the detection performance
(Fig. 11).

4. Conclusion 

We proposed a novel ROI generation method for pedestrian
classification based on FIR images. Experimental results show that
the proposed method performs reliably in urban scenarios
without relying on an intensity threshold. This novel approach
utilizes the low-frequency characteristic, which is a
salient characteristic of FIR images. Further, our proposed method
generates a small number of ROIs at an acceptable miss rate, and
the generated ROIs provide advantages for classification because
the pedestrians are satisfactorily arranged within the bounding
box. In addition, the small number of ROIs is advantageous for
real-time implementation of such systems.